The most familiar application of aggregation is calculating a sum. Here’s a sequential version.
double[] sequence = ...
double sum = 0.0d;
for (int i = 0; i < sequence.Length; i++)
{
    sum += Normalize(sequence[i]);
}
return sum;
This is a typical sequential for loop. In this example and the ones that follow, Normalize
is a user-provided method that transforms the input values in some way,
such as converting them to an appropriate scale. The result is the sum
of the transformed values.
The Microsoft® .NET Framework
Language Integrated Query (LINQ) provides a very simple way to express
this kind of aggregation. Languages such as C#, F#, and Microsoft
Visual Basic® development system provide special syntax for LINQ. The
following LINQ expression calculates a sum.
double[] sequence = ...
return (from x in sequence select Normalize(x)).Sum();
The LINQ expression is a sequential operation whose performance is comparable to that of the sequential for loop shown in the previous example.
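The query syntax above is shorthand for calls to LINQ extension methods. The following minimal sketch shows both forms side by side. The Normalize body here is a hypothetical stand-in (dividing by 100) for the user-provided method, and the class and method names are illustrative only.

```csharp
using System;
using System.Linq;

public static class LinqSumSketch
{
    // Hypothetical stand-in for the user-provided Normalize method.
    public static double Normalize(double x)
    {
        return x / 100.0;
    }

    // Query syntax, as in the text.
    public static double SumWithQuerySyntax(double[] sequence)
    {
        return (from x in sequence select Normalize(x)).Sum();
    }

    // Equivalent method-call syntax: Select projects, Sum aggregates.
    public static double SumWithMethodSyntax(double[] sequence)
    {
        return sequence.Select(x => Normalize(x)).Sum();
    }

    public static void Main()
    {
        double[] sequence = { 100.0, 250.0, 400.0 };
        Console.WriteLine(SumWithQuerySyntax(sequence));
        Console.WriteLine(SumWithMethodSyntax(sequence));
    }
}
```

Both methods visit the elements in the same order on a single thread, so they return the same value.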
Converting a LINQ-to-Objects expression into a parallel query is straightforward. The following code gives an example.
double[] sequence = ...
return (from x in sequence.AsParallel()
        select Normalize(x)).Sum();
If you invoke the AsParallel
extension method, you’re instructing the compiler to bind to PLINQ
instead of to LINQ. The program will use the parallel versions of all
further query operations within the expression. The Sum
extension method executes the query and (behind the scenes and in
parallel) calculates the sum of the selected, transformed values.
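As a self-contained illustration of the conversion described above, the following sketch runs the same query sequentially and in parallel. The Normalize body and the sample data are assumptions made for the example. Because floating-point addition is not associative, the parallel result can differ from the sequential one in the last few bits, so the sketch compares the two with a small tolerance rather than for exact equality.

```csharp
using System;
using System.Linq;

public static class ParallelSumSketch
{
    // Hypothetical stand-in for the user-provided Normalize method.
    public static double Normalize(double x)
    {
        return x / 100.0;
    }

    public static double SequentialSum(double[] sequence)
    {
        return (from x in sequence select Normalize(x)).Sum();
    }

    // The only change is the call to AsParallel on the source.
    public static double ParallelSum(double[] sequence)
    {
        return (from x in sequence.AsParallel()
                select Normalize(x)).Sum();
    }

    public static void Main()
    {
        // Illustrative input: the values 1.0 through 1000.0.
        double[] sequence =
            Enumerable.Range(1, 1000).Select(i => (double)i).ToArray();

        double sequential = SequentialSum(sequence);
        double parallel = ParallelSum(sequence);

        // Partial sums may be combined in a different order in parallel,
        // so allow for a small rounding difference.
        Console.WriteLine(Math.Abs(sequential - parallel) < 1e-9);
    }
}
```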
This example uses
addition as the underlying aggregation operator, but there are many
others. For example, PLINQ has built-in standard query operators that
count the number of elements and calculate the average, maximum, or
minimum. PLINQ also has operators that create and combine sets
(duplicate elimination, union, intersection, and difference), transform
sequences (concatenation, filtering, projection, and partitioning), and group
elements. These standard query operators are sufficient for many
types of aggregation tasks, and with PLINQ they all can efficiently use
the hardware resources of a multicore computer.
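A few of the standard query operators mentioned above can be sketched as follows. The data values and the class name are illustrative assumptions. Note that the binary set operators are applied to two parallel queries, and that results are sorted here only to make the output order predictable, since PLINQ does not preserve source order by default.

```csharp
using System;
using System.Linq;

public static class PlinqOperatorsSketch
{
    // Illustrative data, not taken from the original text.
    public static readonly int[] Values = { 4, 8, 15, 16, 23, 42, 8, 15 };
    public static readonly int[] Others = { 15, 16, 99 };

    // Sorts a parallel query's results so they can be compared or printed.
    public static int[] Sorted(ParallelQuery<int> query)
    {
        return query.OrderBy(v => v).ToArray();
    }

    public static void Main()
    {
        ParallelQuery<int> values = Values.AsParallel();
        ParallelQuery<int> others = Others.AsParallel();

        // Built-in aggregations.
        Console.WriteLine(values.Count());
        Console.WriteLine(values.Average());
        Console.WriteLine(values.Max());
        Console.WriteLine(values.Min());

        // Set operators: duplicate elimination, union, intersection, difference.
        Console.WriteLine(string.Join(",", Sorted(values.Distinct())));
        Console.WriteLine(string.Join(",", Sorted(values.Union(others))));
        Console.WriteLine(string.Join(",", Sorted(values.Intersect(others))));
        Console.WriteLine(string.Join(",", Sorted(values.Except(others))));
    }
}
```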
If PLINQ’s standard query operators aren’t what you need, you can also use the Aggregate extension method to define your own aggregation operators. Here’s an example.
double[] sequence = ...
return (from x in sequence.AsParallel() select Normalize(x))
            .Aggregate(1.0d, (y1, y2) => y1 * y2);
This code shows one of the overloaded versions of the Aggregate
extension method. It applies a user-provided transformation to each
element of the input sequence and then returns the mathematical product
of the transformed values.
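Another overload of Aggregate spells out the steps of a parallel aggregation explicitly: a seed, a function that folds one element into a partition's partial result, a function that combines two partial results, and a final result selector. The sketch below computes the same product with that overload. The Normalize body (halving the input) is a hypothetical stand-in, and the seed is assumed to be the identity of the combining operation because it is used to start each partition's partial result.

```csharp
using System;
using System.Linq;

public static class AggregateProductSketch
{
    // Hypothetical stand-in for the user-provided Normalize method.
    public static double Normalize(double x)
    {
        return x / 2.0;
    }

    public static double Product(double[] sequence)
    {
        return sequence.AsParallel().Aggregate(
            // Seed: 1.0 is the identity for multiplication.
            1.0d,
            // Fold one element into a partition's partial product.
            (partial, x) => partial * Normalize(x),
            // Combine the partial products of two partitions.
            (left, right) => left * right,
            // Final projection of the combined result.
            result => result);
    }

    public static void Main()
    {
        double[] sequence = { 1.0, 2.0, 3.0, 4.0 };
        Console.WriteLine(Product(sequence));
    }
}
```

For the result to be independent of how PLINQ partitions the input, the combining operation should be associative and commutative, as multiplication is here apart from floating-point rounding.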
PLINQ is usually the recommended approach whenever you need to apply the parallel
aggregation pattern to .NET applications. Its declarative nature makes
it less prone to error than other approaches, and its performance on
multicore computers is competitive with them. Implementing parallel
aggregation with PLINQ doesn’t require adding locks in your code.
Instead, all the synchronization occurs internally, within PLINQ.
If PLINQ doesn’t meet your needs or if you prefer a less declarative style of coding, you can also use Parallel.For or Parallel.ForEach to implement the parallel aggregation pattern. The Parallel.For and Parallel.ForEach methods require more complex code than PLINQ. For example, the Parallel.ForEach
method requires your code to include synchronization primitives to
implement parallel aggregation.
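One way this can look is sketched below, using the Parallel.ForEach overload that takes a local-state initializer and a local-finally action. Each task accumulates its own partial sum without locking, and a lock protects only the final merge of each partial sum into the shared total. The Normalize body, the class name, and the variable names are assumptions made for the example.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelForEachSumSketch
{
    // Hypothetical stand-in for the user-provided Normalize method.
    public static double Normalize(double x)
    {
        return x / 100.0;
    }

    public static double Sum(double[] sequence)
    {
        object lockObject = new object();
        double sum = 0.0d;

        Parallel.ForEach(
            // The values to be aggregated.
            sequence,
            // Initial value of each task's local partial sum.
            () => 0.0d,
            // Loop body: add one transformed element to the local partial sum.
            (x, loopState, partialSum) => partialSum + Normalize(x),
            // Final step for each task: merge its partial sum into the total.
            localPartialSum =>
            {
                lock (lockObject)
                {
                    sum += localPartialSum;
                }
            });

        return sum;
    }

    public static void Main()
    {
        double[] sequence =
            Enumerable.Range(1, 1000).Select(i => (double)i).ToArray();
        Console.WriteLine(Sum(sequence));
    }
}
```

Because the lock is taken once per task rather than once per element, contention stays low. The explicit lock is the synchronization that PLINQ would otherwise handle internally.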